Use of Classification Regression Tree in Predicting Oral Absorption in Humans

Authors

  • Jane P. F. Bai
  • Andrey Utis
  • Gordon M. Crippen
  • Han-Dan He
  • Volker Fischer
  • Robert Tullman
  • He-Qun Yin
  • Cheng-Pang Hsu
  • Lan Jiang
  • Kin-Kai Hwang
Abstract

The purpose of this study is to explore the use of classification and regression trees (CART) in predicting the fraction of dose absorbed in humans within the dose-independent range. Because results from clinical formulations in humans were used to train the model, a hypothetical state in which drug molecules are already dissolved in the intestinal fluid was adopted; molecular attributes affecting dissolution were therefore not considered in the model. As a result, the model projects the highest achievable fraction of dose absorbed, providing a reference point for manipulating formulations or solid states to optimize oral clinical efficacy. A set of approximately 1260 structures and their human oral pharmacokinetic data, including bioavailability and/or absorption and/or radiolabeled studies, was used, with 899 compounds as the training set and 362 as the test set. The numerical range of the fraction of dose absorbed, 0 to 1, was divided into 6 classes, each approximately 0.16 wide. A set of 28 structural descriptors was used to model oral absorption without considering active transport; a separate branch was then created to model oral absorption involving active transport. The AAE of the training set was 0.12, and those of five test sets ranged from 0.17 to 0.2. In terms of classification, two test sets of unpublished, proprietary compounds showed 79% to 86% correct prediction when predicted values falling within +/- one class of the real values were counted as correct. Overall, the computational errors from all the test sets of diverse structures were similar and reasonably acceptable. Compared with artificial membranes for ranking drug absorption potential, prediction by the CART model is considered fast and reasonably accurate for accelerating drug discovery.
The accuracy of CART computations can be improved continuously by expanding the chemical space of the training set. In addition, the statistical errors associated with individual decision paths resulting from the training set can be calculated to decide whether to accept individual computations for any test set.
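
The class binning and the two error measures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes six equal-width classes over 0 to 1, reads AAE as the average absolute error between observed and predicted fractions, and counts a prediction as correct when it lands within one class of the observed class. The function names and the sample values are hypothetical and chosen only for illustration.

```python
N_CLASSES = 6  # the abstract divides the 0-to-1 range into 6 classes


def to_class(fraction, n_classes=N_CLASSES):
    """Map a fraction absorbed in [0, 1] to a class index 0..n_classes-1.

    Assumes equal-width classes (about 0.16 wide for 6 classes).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction absorbed must lie in [0, 1]")
    # min() keeps fraction == 1.0 in the top class instead of overflowing.
    return min(int(fraction * n_classes), n_classes - 1)


def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted| (assumed meaning of AAE)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def within_one_class_rate(observed, predicted):
    """Share of predictions whose class is at most one class away."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(to_class(o) - to_class(p)) <= 1
        for o, p in zip(observed, predicted)
    )
    return hits / len(observed)


# Made-up values for illustration only; not data from the study.
observed = [0.05, 0.30, 0.55, 0.80, 0.95]
predicted = [0.10, 0.60, 0.50, 0.70, 0.40]

aae = average_absolute_error(observed, predicted)
rate = within_one_class_rate(observed, predicted)
print(f"AAE = {aae:.2f}, within one class = {rate:.0%}")
```

The same within-one-class check could be applied separately to the compounds reaching each terminal node of a tree, which is one way to read the abstract's suggestion of computing errors per decision path.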


Similar resources

Predicting The Type of Malaria Using Classification and Regression Decision Trees

Maryam Ashoori (1, *), Fatemeh Hamzavi (2). (1) School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran; (2) School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran. Abstract. Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...


Logic regression and its application in predicting diseases

Regression is one of the most important statistical tools in data analysis and in the study of the relationship between predictor variables and the response variable. In most problems, regression models and decision trees can show only the main effects of predictor variables on the response, and the interactions considered between variables do not go beyond two-way or, at most, three-way, due to co...


Prediction of melting points of a diverse chemical set using fuzzy regression tree

Classification and regression trees (CART) have the advantage of being able to handle large data sets and yield readily interpretable models. In spite of these advantages, they are also recognized as highly unstable classifiers with respect to minor perturbations in the training data. In other words, the methods exhibit high variance. Fuzzy logic brings an improvement in these aspects due ...


Predicting Twist Condition by Bayesian Classification and Decision Tree Techniques

Railway infrastructures are among the most important national assets of countries. Most of the annual budget of infrastructure managers is spent on repairing, improving and maintaining railways. The best repair method should consider all economic and technical aspects of the problem. In recent years, data analysis of maintenance records has contributed significantly to minimizing the costs. B...


Supplementary Information - Identification of drug mode of action based on gene expression data: Application in Drug Induced Lung Injury

Here, an independent statistical method is applied to validate the DILD network constructed by the ILP algorithm. To this end, the GUIDE algorithm is used [1, 2]. GUIDE is an algorithm that builds a classification and regression tree model to predict the values of one or more response variables (Y1, Y2, ...) from the values of the predictor variables (X1, X2, ...). It can also produce an import...



Journal title:
  • Journal of chemical information and computer sciences

Volume 44, Issue 6

Pages: -

Publication date: 2004